The group were unsure what the p value meant in relation to the null hypothesis, so another lecture on interpreting chi-squared results would definitely be useful.


Sample    Black  Red  Orange  Yellow  Green
Grass A       6    0       6       7      1
Grass B       5    9       1       3      2
Grass C       5    4       4       5      2
Wood A        7    3       3       6      1
Wood B        5    3       4       5      3
Wood C        6    0       6       5      3

In this table Yellow and Green were not grouped together. This posed a problem because the degrees of freedom (20) are greater than the total number of green jelly babies (12), so the statistical test would not be accurate.
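
The comparison between the degrees of freedom and the green total can be checked with a short sketch. It assumes the usual (rows - 1) x (columns - 1) rule for a table of counts, which reproduces the figure of 20; the variable names are illustrative and not part of the original analysis.

```python
# Ungrouped counts per sample, in the order Black, Red, Orange, Yellow, Green.
colours = ["Black", "Red", "Orange", "Yellow", "Green"]
counts = {
    "Grass A": [6, 0, 6, 7, 1],
    "Grass B": [5, 9, 1, 3, 2],
    "Grass C": [5, 4, 4, 5, 2],
    "Wood A": [7, 3, 3, 6, 1],
    "Wood B": [5, 3, 4, 5, 3],
    "Wood C": [6, 0, 6, 5, 3],
}

# Assumed rule: degrees of freedom = (rows - 1) * (columns - 1).
degrees_of_freedom = (len(counts) - 1) * (len(colours) - 1)

# Total number of jelly babies of each colour across all six samples.
colour_totals = {
    colour: sum(row[i] for row in counts.values())
    for i, colour in enumerate(colours)
}

print("degrees of freedom:", degrees_of_freedom)
print("green total:", colour_totals["Green"])
print("green total below degrees of freedom:",
      colour_totals["Green"] < degrees_of_freedom)
```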


Sample    Black  Red  Orange  Yellow/Green
Grass A       6    0       6             8
Grass B       5    9       1             5
Grass C       5    4       4             7
Wood A        7    3       3             7
Wood B        5    3       4             8
Wood C        6    0       6             8

Grouping Yellow and Green solves this problem by decreasing the degrees of freedom to 15 and removing the problem of the low frequency of green jelly babies.
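
As a rough illustration of the grouping step, the sketch below adds the Yellow and Green counts together for each sample and recomputes the degrees of freedom. The helper name `merge_last_two` is hypothetical, and the (rows - 1) x (columns - 1) rule is an assumption that happens to reproduce the figure of 15.

```python
def merge_last_two(row):
    """Return a copy of the row with its last two counts added together."""
    return row[:-2] + [row[-2] + row[-1]]

# Ungrouped counts in the order Black, Red, Orange, Yellow, Green.
ungrouped = {
    "Grass A": [6, 0, 6, 7, 1],
    "Grass B": [5, 9, 1, 3, 2],
    "Grass C": [5, 4, 4, 5, 2],
    "Wood A": [7, 3, 3, 6, 1],
    "Wood B": [5, 3, 4, 5, 3],
    "Wood C": [6, 0, 6, 5, 3],
}

# Grouped counts in the order Black, Red, Orange, Yellow/Green.
grouped = {name: merge_last_two(row) for name, row in ungrouped.items()}

n_rows = len(grouped)
n_cols = len(next(iter(grouped.values())))
grouped_dof = (n_rows - 1) * (n_cols - 1)

for name, row in grouped.items():
    print(name, row)
print("degrees of freedom after grouping:", grouped_dof)
```

Every sample still totals 20 after grouping, so no counts are lost by merging the two colours.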

The null hypothesis is that there will be no significant variation from the expected values i.e. no significant difference in the population frequencies between samples.

Is variation present?

The group decided to compare the proportions of polymorphisms across a sample. A p value of 0.05 means that, if the null hypothesis is true, there is a 5% chance of getting a chi-squared value equal to or greater than the one observed. The results are treated as significant if the p value is less than or equal to 0.05.
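
For reference, the statistic and the p value described above can be written as follows. The symbols are notation introduced here for illustration: O_i and E_i are the observed and expected counts for colour class i, k is the number of classes, and X is a chi-squared variable with nu degrees of freedom.

```latex
\chi^2_{\mathrm{obs}} = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i},
\qquad
p = P\left(X \ge \chi^2_{\mathrm{obs}} \mid H_0\right),
\quad X \sim \chi^2_{\nu}
```

Under this convention the null hypothesis is rejected when p is less than or equal to 0.05.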

Below are the results for all the samples. In every sample the p value is greater than 0.05, so the null hypothesis cannot be rejected: the observed values do not differ significantly from the expected values. This means the results are consistent with sampling error, but the same result could also be produced through the action of genetic drift, selection and gene flow, which would prompt further investigation as some variation is present.
A low p value would indicate large deviation from the expected values between the polymorphisms.


GRASS A
                Black  Red  Orange  Y/G  Totals
observed            6    0       6    8      20
expected            5    5       5    5      20
O - E               1   -5       1    3
(O - E)^2           1   25       1    9
((O - E)^2)/E     0.2    5     0.2  1.8
chi^2 = 7.2
p-value = 0.0658

GRASS B
                Black  Red  Orange  Y/G  Totals
observed            5    9       1    5      20
expected            5    5       5    5      20
O - E               0    4      -4    0
(O - E)^2           0   16      16    0
((O - E)^2)/E       0  3.2     3.2    0
chi^2 = 6.4
p-value = 0.0937

GRASS C
                Black  Red  Orange  Y/G  Totals
observed            5    4       4    7      20
expected            5    5       5    5      20
O - E               0   -1      -1    2
(O - E)^2           0    1       1    4
((O - E)^2)/E       0  0.2     0.2  0.8
chi^2 = 1.2
p-value = 0.753

WOOD A
                Black  Red  Orange  Y/G  Totals
observed            7    3       3    7      20
expected            5    5       5    5      20
O - E               2   -2      -2    2
(O - E)^2           4    4       4    4
((O - E)^2)/E     0.8  0.8     0.8  0.8
chi^2 = 3.2
p-value = 0.3618

WOOD B
                Black  Red  Orange  Y/G  Totals
observed            5    3       4    8      20
expected            5    5       5    5      20
O - E               0   -2      -1    3
(O - E)^2           0    4       1    9
((O - E)^2)/E       0  0.8     0.2  1.8
chi^2 = 2.8
p-value = 0.4235

WOOD C
                Black  Red  Orange  Y/G  Totals
observed            6    0       6    8      20
expected            5    5       5    5      20
O - E               1   -5       1    3
(O - E)^2           1   25       1    9
((O - E)^2)/E     0.2    5     0.2  1.8
chi^2 = 7.2
p-value = 0.0658

None of the p values above are below 0.05, so none of the samples deviates significantly from the expected values. Some variation between the samples is still visible in the counts. Whether this is due to genetic drift, sampling error, gene flow or selection is unclear at this point.
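
The per-sample figures can be reproduced with a small sketch. It assumes an expected count of 5 per colour class (20 jelly babies spread evenly over four classes, as in the tables) and uses a closed-form upper-tail probability that is valid only for 3 degrees of freedom; the function names are illustrative.

```python
import math

def chi_squared(observed, expected):
    """Sum of (O - E)^2 / E over all colour classes."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def p_value_df3(chi2):
    """Upper-tail probability for a chi-squared value with 3 degrees of freedom."""
    if chi2 <= 0:
        return 1.0
    half = chi2 / 2
    return math.erfc(math.sqrt(half)) + math.sqrt(2 * chi2 / math.pi) * math.exp(-half)

# Grouped counts in the order Black, Red, Orange, Yellow/Green.
samples = {
    "Grass A": [6, 0, 6, 8],
    "Grass B": [5, 9, 1, 5],
    "Grass C": [5, 4, 4, 7],
    "Wood A": [7, 3, 3, 7],
    "Wood B": [5, 3, 4, 8],
    "Wood C": [6, 0, 6, 8],
}
expected = [5, 5, 5, 5]  # assumption: equal frequencies of the four classes

results = {}
for name, observed in samples.items():
    chi2 = chi_squared(observed, expected)
    p = p_value_df3(chi2)
    results[name] = (chi2, p)
    verdict = "significant" if p <= 0.05 else "not significant"
    print(f"{name}: chi^2 = {chi2:.1f}, p = {p:.4f} ({verdict})")
```

With four colour classes and a fixed sample total there are 4 - 1 = 3 degrees of freedom, which is why the 3-degree formula is used here.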

Is genetic drift present?

If selection or gene flow is taking place, the group expected a p value below 0.05, in which case the null hypothesis that there is no significant difference between the samples would be rejected. If genetic drift is the predominant factor causing variation, p values above 0.05 would be expected, although such values could also be due to sampling error.

Separate environments GRASS
                              Black       Red    Orange  Yellow/Green    Totals
EXPECTED                   5.333333  4.333333  3.666667      6.666667
O - E (Grass A)            0.666667  -4.33333  2.333333      1.333333
O - E (Grass B)            -0.33333  4.666667  -2.66667      -1.66667
O - E (Grass C)            -0.33333  -0.33333  0.333333      0.333333
(O - E)^2 (Grass A)        0.444444  18.77778  5.444444      1.777778
(O - E)^2 (Grass B)        0.111111  21.77778  7.111111      2.777778
(O - E)^2 (Grass C)        0.111111  0.111111  0.111111      0.111111
((O - E)^2)/E (Grass A)    0.083333  4.333333  1.484848      0.266667
((O - E)^2)/E (Grass B)    0.020833  5.025641  1.939394      0.416667
((O - E)^2)/E (Grass C)    0.020833  0.025641  0.030303      0.016667
DoF = 6
Chi^2                         0.125  9.384615  3.454545           0.7    13.664
P value                        0.94    0.0091    0.1777        0.7046    0.0336












Separate environments WOOD
                              Black       Red    Orange  Yellow/Green    Totals
EXPECTED                          6         2  4.333333      7.666667
O - E (Wood A)                    1         1  -1.33333      -0.66667
O - E (Wood B)                   -1         1  -0.33333      0.333333
O - E (Wood C)                    0        -2  1.666667      0.333333
(O - E)^2 (Wood A)                1         1  1.777778      0.444444
(O - E)^2 (Wood B)                1         1  0.111111      0.111111
(O - E)^2 (Wood C)                0         4  2.777778      0.111111
((O - E)^2)/E (Wood A)     0.166667       0.5  0.410256      0.057971
((O - E)^2)/E (Wood B)     0.166667       0.5  0.025641      0.014493
((O - E)^2)/E (Wood C)            0         2  0.641026      0.014493
DoF = 6
Chi^2                      0.333333         3  1.076923      0.086957     4.497
P value                        0.85      0.22      0.58          0.96    0.6097

The results for the grass environment are significant because the p value (0.0336) is under 0.05. This suggests the results are not due to sampling error and are more likely to be due to selection or gene flow.

The results for wood may be non-significant because the total number of red jelly babies (6) was equal to the degrees of freedom, so the statistical test may not be completely accurate. The result may also be down to sampling error, or to genetic drift having a greater effect on polymorphisms than selection.
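
The within-environment tests can be sketched as follows. The expected value for each colour is taken as the mean count across the three samples of that environment, as in the tables, and the p values use a closed form that holds only for an even number of degrees of freedom (6 for the whole table; 2 is assumed for a single colour across three samples, which matches the per-colour p values reported). Function names are illustrative.

```python
import math

def p_value_even_df(chi2, df):
    """Upper-tail chi-squared probability; valid only for even df >= 2."""
    if df < 2 or df % 2:
        raise ValueError("df must be a positive even integer")
    half = chi2 / 2
    return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(df // 2))

def within_environment(samples):
    """Return per-colour chi^2 values, their total and the degrees of freedom."""
    columns = list(zip(*samples))
    expected = [sum(col) / len(samples) for col in columns]
    per_colour = [
        sum((o - e) ** 2 / e for o in col) for col, e in zip(columns, expected)
    ]
    dof = (len(samples) - 1) * (len(columns) - 1)
    return per_colour, sum(per_colour), dof

# Grouped counts in the order Black, Red, Orange, Yellow/Green.
grass = [[6, 0, 6, 8], [5, 9, 1, 5], [5, 4, 4, 7]]
wood = [[7, 3, 3, 7], [5, 3, 4, 8], [6, 0, 6, 8]]

grass_cols, grass_total, grass_dof = within_environment(grass)
wood_cols, wood_total, wood_dof = within_environment(wood)

grass_p = p_value_even_df(grass_total, grass_dof)
wood_p = p_value_even_df(wood_total, wood_dof)
grass_red_p = p_value_even_df(grass_cols[1], 2)  # red only, 2 degrees of freedom

print(f"grass: chi^2 = {grass_total:.3f}, DoF = {grass_dof}, p = {grass_p:.4f}")
print(f"wood:  chi^2 = {wood_total:.3f}, DoF = {wood_dof}, p = {wood_p:.4f}")
print(f"grass, red only: p = {grass_red_p:.4f}")
```

Because every sample contains 20 jelly babies, the per-colour mean is the same expected value that a standard test on the 3 x 4 table of counts would use.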

Is it gene flow?
The results so far suggest that the grass environment could be experiencing selection or gene flow to a greater extent than genetic drift.
The next table shows p values for a comparison of the averages of the raw data in each environment. The idea is that a significant difference between the values of the two environments could suggest that selection is taking place. It could also show a reduced role for gene flow (if gene flow were taking place, the values would be expected to be similar). The null hypothesis is that there is no significant variation from the expected values.

                    Black       Red    Orange  Yellow/Green
Average Grass    5.333333  4.333333  3.666667      6.666667
Average Wood            6         2  4.333333      7.666667













Comparing average of different environments
                              Black       Red    Orange  Yellow/Green    Totals
EXPECTED                   5.666667  3.166667         4      7.166667
O - E (Grass)              -0.33333  1.166667  -0.33333          -0.5
O - E (Wood)               0.333333  -1.16667  0.333333           0.5
(O - E)^2 (Grass)          0.111111  1.361111  0.111111          0.25
(O - E)^2 (Wood)           0.111111  1.361111  0.111111          0.25
((O - E)^2)/E (Grass)      0.019608  0.429825  0.027778      0.034884
((O - E)^2)/E (Wood)       0.019608  0.429825  0.027778      0.034884
DoF = 3
Chi^2                      0.039216  0.859649  0.055556      0.069767   1.02395
P value                       0.843    0.3538    0.8136        0.7916    0.7954

This p value is very high, which suggests the results are not significant and are likely to be due to sampling error (or genetic drift). A high p value means the averages for the two environments deviate only slightly from the expected values. The group concluded that gene flow is unlikely to be having an effect, having expected gene flow to produce a low p value.
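
The comparison of the two environments can be sketched in the same way. This follows the report's own procedure of treating the environment averages as observed values, which is an assumption carried over rather than a recommendation. The upper-tail function below is a general closed-form recurrence for whole-number degrees of freedom, and its name is illustrative.

```python
import math

def chi2_upper_tail(x, df):
    """Upper-tail probability P(X >= x) for a chi-squared variable, integer df >= 1."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    half = x / 2
    if df % 2 == 0:
        a, q = 1.0, math.exp(-half)               # starting value for df = 2
    else:
        a, q = 0.5, math.erfc(math.sqrt(half))    # starting value for df = 1
    while a < df / 2:
        # Each step raises the degrees of freedom by 2.
        q += half ** a * math.exp(-half) / math.gamma(a + 1)
        a += 1.0
    return q

# Environment averages in the order Black, Red, Orange, Yellow/Green.
grass_avg = [16 / 3, 13 / 3, 11 / 3, 20 / 3]
wood_avg = [6, 2, 13 / 3, 23 / 3]

expected = [(g + w) / 2 for g, w in zip(grass_avg, wood_avg)]
per_colour = [
    ((g - e) ** 2 + (w - e) ** 2) / e
    for g, w, e in zip(grass_avg, wood_avg, expected)
]
total = sum(per_colour)
dof = (2 - 1) * (len(expected) - 1)

overall_p = chi2_upper_tail(total, dof)
red_p = chi2_upper_tail(per_colour[1], 1)  # one colour, two environments

print(f"chi^2 = {total:.3f}, DoF = {dof}, p = {overall_p:.4f}")
print(f"red only: p = {red_p:.4f}")
```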

Conclusion
In conclusion, the counts of the polymorphisms varied in all the samples, which suggests that selection, gene flow or genetic drift is taking place (or that the variation is sampling error).
Furthermore, the results suggest that selection or gene flow is taking place on a greater scale in the grass environment than in the wood environment, where genetic drift or sampling error appears to be having a greater effect.
Finally, the difference between the environment averages was not significant (p = 0.7954), which the group attributes to selection, genetic drift or sampling error rather than gene flow.
Through deduction, the differences in polymorphisms in grass are likely to be due to selection, and those in wood are likely to be due to genetic drift or sampling error. This is because gene flow has been discounted in both areas.